Speeding Up Speaker Diarization by Using Prosodic Features

Authors

  • Yan Huang
  • Gerald Friedland
  • Christian Müller
  • Nikki Mirghafori

Abstract

In this article we present a method to speed up the agglomerative clustering used in speaker diarization by using long-term prosodic features. A set of these features is used to decide which clusters should be merged. This strategy reduces the number of merge decisions that must be made with the more computationally intensive method based on the Bayesian Information Criterion (BIC). We show a speedup of 30% for a state-of-the-art diarization system.

This work was partly funded by DTO VACE contract number NBCHC060157. Gerald Friedland and Christian Müller were supported by a fellowship within the postdoc program of the German Academic Exchange Service (DAAD).

Affiliations: International Computer Science Institute, Berkeley; Department of Computer Science, University of California, Berkeley. Contact: {yan,fractor,cmueller,nikki}@icsi.berkeley.edu
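The merge strategy described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: each cluster is summarized by one hypothetical long-term prosodic vector, candidate pairs are ranked by Euclidean distance between those vectors, and only the `top_k` closest pairs are scored with a simplified single-Gaussian delta-BIC. The names `delta_bic`, `pick_merge`, and `top_k` are hypothetical.

```python
import numpy as np


def delta_bic(a, b, penalty_weight=1.0):
    """Simplified single-Gaussian delta-BIC for two frame sets (rows = frames).

    Lower values favor merging. This is a textbook variant used only for
    illustration, not necessarily the criterion used in the paper.
    """
    ab = np.vstack([a, b])
    n, d = ab.shape

    def logdet(x):
        # Full covariance with a small ridge for numerical stability.
        cov = np.atleast_2d(np.cov(x, rowvar=False)) + 1e-6 * np.eye(d)
        return np.linalg.slogdet(cov)[1]

    n_params = d + d * (d + 1) / 2  # mean plus full covariance
    fit = 0.5 * (n * logdet(ab) - len(a) * logdet(a) - len(b) * logdet(b))
    return fit - 0.5 * penalty_weight * n_params * np.log(n)


def pick_merge(spectral, prosodic, top_k=3):
    """Choose the next pair of clusters to merge.

    spectral: list of (n_i, d) arrays of short-term frames, one per cluster.
    prosodic: list of long-term feature vectors, one per cluster (assumed).
    Returns the chosen pair and the number of BIC evaluations performed.
    """
    prosodic = [np.asarray(p, dtype=float) for p in prosodic]
    pairs = [(i, j) for i in range(len(spectral))
             for j in range(i + 1, len(spectral))]
    # Cheap step: rank all pairs by distance between prosodic vectors.
    pairs.sort(key=lambda p: np.linalg.norm(prosodic[p[0]] - prosodic[p[1]]))
    candidates = pairs[:top_k]
    # Expensive step: evaluate BIC only on the shortlisted pairs.
    scores = {p: delta_bic(spectral[p[0]], spectral[p[1]]) for p in candidates}
    best = min(scores, key=scores.get)
    return best, len(candidates)
```

In this sketch, calling `pick_merge` on three clusters with `top_k=2` scores only two of the three possible pairs with BIC. That is the kind of saving the abstract refers to, although the actual features and selection rule in the paper may differ.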


Similar resources

Prosodic and Phonetic Features for Speaker Clustering in Speaker Diarization Systems

This work is focused on speaker clustering methods that are used in speaker diarization systems. The purpose of speaker clustering is to associate together segments that belong to the same speaker and is usually applied in the last stage of the speaker-diarization process. We concentrate on developing proper representations of speaker segments for clustering. We realize two different speaker cl...


Using voice-quality measurements with prosodic and spectral features for speaker diarization

Jitter and shimmer voice-quality measurements have been successfully used to detect voice pathologies and classify different speaking styles. In this paper, we investigate the usefulness of jitter and shimmer voice measurements in the framework of the speaker diarization task. The combination of jitter and shimmer voice-quality features with the long-term prosodic and short-term spectral feature...


The Detection of Overlapping Speech with Prosodic Features for Speaker Diarization

Overlapping speech is responsible for a certain amount of the errors produced by standard speaker diarization systems in meeting environments. We are investigating a set of prosody-based long-term features as a potential complement to our overlap detection system relying on short-term spectral parameters. The most relevant features are selected in a two-step process. They are firstly evaluated and s...


Detection and Handling of Overlapping Speech for Speaker Diarization

This thesis concerns the detection of overlapping speech segments and its further application for the improvement of speaker diarization performance. We propose the use of three spatial cross-correlation-based parameters for overlap detection on distant microphone channel data. Spatial features from different microphone pairs are fused by means of principal component analysis or by an approach in...


Developing On-Line Speaker Diarization System

In this paper we describe the process of converting a research prototype system for Speaker Diarization into a fully deployed product running in real time and with low latency. The deployment is a part of the IBM Cloud Speech-to-Text (STT) Service. First, the prototype system is described and the requirements for the on-line, deployable system are introduced. Then we describe the technical appr...



Publication date: 2014